Primary exercises

  1. Manually created factor.
    In a study, participants were asked whether their sport activity was none, oncePerWeek, severalPerWeek or daily.
    Build a proper factor for the responses below (? denotes a missing answer) and store it in a variable w.
    Print the factor.
    Write the code to count the number of occurrences of each level and print the counts.
severalPerWeek, none, none, oncePerWeek, oncePerWeek, oncePerWeek, oncePerWeek, ?, none, none
library( tidyverse )  # attaches forcats, which provides fct_count() and the other fct_ functions
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek   
 [6] oncePerWeek    oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
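The raw answers can also be turned into a factor programmatically instead of retyping them. A minimal sketch in base R, assuming the answers arrive as one comma-separated string (the names raw_answers, v_raw, w_check and w_default are illustrative only). It relies on factor() mapping every value that is not among the given levels, here "?", to NA, and it shows that omitting levels drops the never-observed level daily.

```r
# Raw answers as listed in the exercise; "?" marks a missing answer
raw_answers <- "severalPerWeek, none, none, oncePerWeek, oncePerWeek, oncePerWeek, oncePerWeek, ?, none, none"
v_raw <- strsplit( raw_answers, ", " )[[ 1 ]]

lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

# Values not listed in levels (here "?") become NA
w_check <- factor( v_raw, levels = lvs )
sum( is.na( w_check ) )   # 1
table( w_check )          # none 4, oncePerWeek 4, severalPerWeek 1, daily 0

# Without explicit levels, factor() uses the sorted observed values:
# "?" becomes a level of its own and "daily" (never observed) is absent
w_default <- factor( v_raw )
levels( w_default )
```

This is why the exercise asks for an explicit levels argument: it fixes both the set and the order of the categories.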
  2. A factor with random content.
    Read the help page for the function sample.
    Then study and run the following lines of code to understand the results.
    Next, work out why an error is generated, and use the replace argument to generate a vector of 100 samples.
    Store this vector in a variable v and build a factor w from it.
    Finally, count the number of occurrences of each level in w.
    Ensure that the levels are in the order given by the variable lvs.
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
sample( lvs, 3 )
[1] "none"           "severalPerWeek" "oncePerWeek"   
sample( lvs, 3 )
[1] "none"           "severalPerWeek" "oncePerWeek"   
sample( lvs, 3 )
[1] "daily"          "oncePerWeek"    "severalPerWeek"
sample( lvs, 100 )
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
v <- sample( lvs, 100, replace = TRUE )
w <- factor( v, levels = lvs )
w
  [1] severalPerWeek daily          severalPerWeek severalPerWeek severalPerWeek
  [6] none           none           oncePerWeek    oncePerWeek    oncePerWeek   
 [11] daily          none           oncePerWeek    oncePerWeek    none          
 [16] daily          severalPerWeek oncePerWeek    none           severalPerWeek
 [21] none           oncePerWeek    severalPerWeek severalPerWeek none          
 [26] severalPerWeek oncePerWeek    none           none           oncePerWeek   
 [31] severalPerWeek daily          none           none           oncePerWeek   
 [36] none           oncePerWeek    none           severalPerWeek severalPerWeek
 [41] none           daily          none           severalPerWeek oncePerWeek   
 [46] severalPerWeek none           none           oncePerWeek    none          
 [51] oncePerWeek    daily          oncePerWeek    oncePerWeek    oncePerWeek   
 [56] oncePerWeek    oncePerWeek    severalPerWeek none           severalPerWeek
 [61] none           daily          oncePerWeek    oncePerWeek    severalPerWeek
 [66] oncePerWeek    daily          none           severalPerWeek daily         
 [71] oncePerWeek    severalPerWeek severalPerWeek none           severalPerWeek
 [76] daily          oncePerWeek    oncePerWeek    severalPerWeek oncePerWeek   
 [81] daily          severalPerWeek oncePerWeek    severalPerWeek daily         
 [86] oncePerWeek    severalPerWeek oncePerWeek    severalPerWeek none          
 [91] daily          none           oncePerWeek    daily          severalPerWeek
 [96] daily          oncePerWeek    daily          oncePerWeek    severalPerWeek
Levels: none oncePerWeek severalPerWeek daily
fct_count( w )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 none              24
2 oncePerWeek       32
3 severalPerWeek    28
4 daily             16
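Because sample() draws at random, the listing and the counts above will differ on every run. A minimal sketch, assuming reproducible draws are wanted: set.seed() fixes the random number generator so that repeated draws match (the seed 42 and the names perm, res, v1 and v2 are arbitrary choices). It also illustrates the earlier error: without replacement, sample() can return at most length( lvs ) = 4 values.

```r
lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )

# With no size given, sample() returns a random permutation of all four values
perm <- sample( lvs )

# Asking for more values than exist without replacement raises an error
res <- tryCatch( sample( lvs, 5 ), error = function( e ) conditionMessage( e ) )

# The same seed reproduces the same draws
set.seed( 42 )
v1 <- sample( lvs, 100, replace = TRUE )
set.seed( 42 )
v2 <- sample( lvs, 100, replace = TRUE )
identical( v1, v2 )   # TRUE

w <- factor( v1, levels = lvs )
sum( table( w ) )     # 100: every draw falls into one of the four levels
```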
  3. Reordering factor levels.
    When a factor is shown on an axis of a plot, the order of its categories is given by its levels.
    The factor w from the previous exercise would therefore be shown in this order: none, oncePerWeek, severalPerWeek, daily.
    For a figure in a manuscript, however, the following order might be needed: daily, severalPerWeek, oncePerWeek, none.
    Apply one of the fct_ functions from the tidyverse (package forcats) to w to produce a factor w2 with the requested order.
    Show the levels of w2.
    Again, show the number of elements of each level in w2 and compare it with the table from the previous exercise.
w2 <- fct_relevel( w, c( "daily", "severalPerWeek", "oncePerWeek", "none" ) )
levels( w2 )
[1] "daily"          "severalPerWeek" "oncePerWeek"    "none"          
fct_count( w2 )
# A tibble: 4 × 2
  f                  n
  <fct>          <int>
1 daily             16
2 severalPerWeek    28
3 oncePerWeek       32
4 none              24
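Reordering levels changes only the display order and the integer codes stored behind the factor, not the values themselves. A minimal sketch, assuming the forcats package is installed; fct_rev() and a plain base R factor() call are shown as alternatives that happen to give the same order here because the requested order is the exact reverse (w2b and w2c are illustrative names, and w is rebuilt from fresh random draws, so its counts differ from the listing above).

```r
library( forcats )

lvs <- c( "none", "oncePerWeek", "severalPerWeek", "daily" )
w <- factor( sample( lvs, 100, replace = TRUE ), levels = lvs )

# Three ways to obtain the reversed level order
w2  <- fct_relevel( w, c( "daily", "severalPerWeek", "oncePerWeek", "none" ) )
w2b <- fct_rev( w )                               # reverse the existing order
w2c <- factor( w, levels = rev( levels( w ) ) )   # base R equivalent
levels( w2b )

# The values are unchanged ...
all( as.character( w2 ) == as.character( w ) )    # TRUE
# ... but the integer codes follow the new level order
head( data.frame( value = w, code_w = as.integer( w ), code_w2 = as.integer( w2 ) ) )
```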

Extra exercises

  1. Counting with table(); getting counts for single levels.
    fct_count() is a tidyverse (forcats) function for counting factor elements; it returns the result as a table (a tibble object).
    The base R function table() provides similar functionality but returns the result in a different format.
    Reuse the factor w from the first primary exercise.
    Try table( w ) and compare its output with fct_count( w ).
    Store the counts as follows: cnts <- table( w ). Then use square brackets on cnts to get the count of oncePerWeek.
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek", "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
w
 [1] severalPerWeek none           none           oncePerWeek    oncePerWeek   
 [6] oncePerWeek    oncePerWeek    <NA>           none           none          
Levels: none oncePerWeek severalPerWeek daily
table( w )
w
          none    oncePerWeek severalPerWeek          daily 
             4              4              1              0 
fct_count( w )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
cnts <- table( w )
cnts[ "oncePerWeek" ]
oncePerWeek 
          4 
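Two details of table() explain the differences from fct_count() seen above: by default it leaves out NA values (its useNA argument defaults to "no"), and single-bracket indexing keeps the level name. A minimal base R sketch, rebuilding w as in the first primary exercise:

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )
cnts <- table( w )

cnts[[ "oncePerWeek" ]]        # 4, a plain number without the name
table( w, useNA = "ifany" )    # adds an <NA> column with count 1, like fct_count()
as.data.frame( cnts )          # data frame with columns w and Freq
```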
  2. Special ordering of levels.
    ➡️ Consult the forcats cheat sheet to find out how to order the factor by frequency of occurrence.
    Reuse w from the previous exercise and construct a factor w3 with the same values and with the levels sorted by decreasing number of occurrences.
    Count the occurrences to demonstrate correctness.
    Now, find a way to sort the levels in increasing order.
w3 <- fct_infreq( w )
fct_count( w3 )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 none               4
2 oncePerWeek        4
3 severalPerWeek     1
4 daily              0
5 <NA>               1
fct_count( fct_rev( w3 ) )
# A tibble: 5 × 2
  f                  n
  <fct>          <int>
1 daily              0
2 severalPerWeek     1
3 oncePerWeek        4
4 none               4
5 <NA>               1
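The frequency-based level order can also be computed in base R by sorting the output of table(). A minimal sketch (w4 and w5 are illustrative names). Note that none and oncePerWeek are tied with 4 occurrences each; tied levels may come out in a different order than with fct_rev( w3 ), since fct_rev() simply reverses whatever tie order fct_infreq() chose.

```r
v <- c( "severalPerWeek", "none", "none", "oncePerWeek", "oncePerWeek",
        "oncePerWeek", "oncePerWeek", NA, "none", "none" )
w <- factor( v, levels = c( "none", "oncePerWeek", "severalPerWeek", "daily" ) )

# Levels by increasing count; the unused level daily (count 0) comes first
w4 <- factor( w, levels = names( sort( table( w ) ) ) )
table( w4, useNA = "ifany" )

# Levels by decreasing count, comparable to fct_infreq( w )
w5 <- factor( w, levels = names( sort( table( w ), decreasing = TRUE ) ) )
levels( w5 )
```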


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC